Computer Systems: A Programmer's Perspective (Global Edition): The Path to Execution: Understanding the Compiler Driver

The Conductor: The Compiler Driver

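Think of the compiler driver (such as GCC) as a grand conductor. It automates the complex process of translating human-readable source code into a binary executable. This journey, the path to execution, begins at compile time and extends through load time and run time.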

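With separate compilation, the driver processes main.c and sum.c independently. Changing one module does not require retranslating the entire project: only the modified file has to pass through the preprocessor (cpp), the compiler (cc1), and the assembler (as) before the linker (ld) combines the resulting relocatable object files into an executable.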

Efficiency and the Memory Hierarchy

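Where the linker places data such as grid[0][0] or src[0][0] directly affects throughput and latency. When that data is aligned to 32-byte cache lines and the code traverses it with a stride-1 reference pattern, each cache line fetched on a cold miss serves several consecutive accesses, which keeps misses to a minimum and avoids the cache evictions caused by column-wise scans. In high-performance code, the parallelism exposed by loop unrolling ($4 \times 4$ unrolled loops) further hides latency by keeping several independent operations in flight and making better use of each clock cycle (0x32, 0x1, 0x4, 0x51).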


QUESTION 1

Which program invoked by the compiler driver is responsible for generating the assembly file (/tmp/main.s)?

The preprocessor (cpp)

The compiler (cc1)

The assembler (as)

The linker (ld)

QUESTION 2

What is a primary benefit of 'Separate Compilation'?

It makes the final executable run faster.

It allows modifications to one file without re-translating others.

It automatically unrolls all loops to 4x4.

It eliminates the need for a linker.

QUESTION 3

How does a Stride-1 reference pattern affect the L1 cache?

It causes column-wise scan evictions.

It maximizes hit rates by utilizing spatial locality.

It bypasses the cache to reduce latency.

It increases the number of cold misses to 100%.

QUESTION 4

What happens at 0x064C if the linker places a multi-byte integer across a 32-byte cache-line boundary?

The compiler driver automatically fixes it at run time.

The L1 cache throughput is maximized.

A potential drop in hit rates and increased latency occurs.

The assembler produces a relocatable error.

QUESTION 5

The hex values 0x32, 0x1, 0x4, and 0x51 in the text above most likely represent:

The binary tags for the L2 cache.

Clock frequency stalls or memory fetch latencies.

The sequence of registers used in a 4x4 unroll.

The static library identifiers.